Skip to content

20099: Add Decimal support for floor preimage#224

Open
martin-augment wants to merge 4 commits intomainfrom
pr-20099-2026-02-02-08-18-45
Open

20099: Add Decimal support for floor preimage#224
martin-augment wants to merge 4 commits intomainfrom
pr-20099-2026-02-02-08-18-45

Conversation

@martin-augment
Copy link
Owner

20099: To review by AI

@gemini-code-assist
Copy link

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the floor function's optimization capabilities by introducing support for Decimal data types in its preimage calculation. This change allows the query optimizer to convert floor(decimal_column) = N conditions into more performant range filters, such as decimal_column >= N AND decimal_column < N + 1. The implementation includes a new utility function for decimal bound computation and is thoroughly validated with both unit tests and SQL logic tests to ensure correctness and proper query plan generation.

Highlights

  • Decimal Support for Floor Preimage: Extended the floor function's preimage optimization to support all Decimal data types (Decimal32, Decimal64, Decimal128, Decimal256). This allows predicates like floor(decimal_col) = N to be transformed into efficient range scans (decimal_col >= N AND decimal_col < N + 1).
  • New decimal_preimage_bounds Function: Introduced a new helper function decimal_preimage_bounds to calculate the lower and upper bounds for the floor preimage of decimal values, handling fractional parts (returning None) and potential overflow during bound calculation.
  • Comprehensive Unit Tests: Added extensive unit tests for the new decimal preimage logic, covering valid integer-like decimal inputs, non-integer decimals, overflow scenarios, edge cases (min/max values), and null inputs across all Decimal types.
  • SQL Logic Test Verification: A new SQL logic test file (floor_preimage.slt) was added to verify the optimization in the query planner for Decimal128, alongside existing Float64 and Int32 tests. This includes EXPLAIN plans to confirm the predicate transformation for various comparison operators.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link

coderabbitai bot commented Feb 2, 2026

Walkthrough

This change adds support for decimal types (Decimal32, Decimal64, Decimal128, Decimal256) to the floor preimage optimization logic in DataFusion. A new helper function decimal_preimage_bounds computes preimage ranges for decimal values with proper handling of precision, scale, and overflow. The core floor matching logic is extended to handle decimal scalar values. Additionally, a new sqllogictest file verifies the floor preimage transformation works correctly across different data types, comparison operators, and edge cases, including NULL handling and non-optimizable predicates.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-20099-2026-02-02-08-18-45

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link

claude bot commented Feb 2, 2026

Code Review: Add Decimal support for floor preimage

Summary

This PR adds decimal type support (Decimal32, Decimal64, Decimal128, Decimal256) to the floor function's preimage optimization. The implementation looks solid overall with comprehensive test coverage. Here are my findings:


✅ Strengths

  1. Comprehensive Test Coverage: Excellent unit tests covering all decimal types with edge cases (overflow, null, non-integer, min/max values)
  2. Consistent Pattern: The implementation follows the existing pattern for float and integer types
  3. Good Documentation: The SQL logic tests are well-documented and cover various comparison operators
  4. Correctness: The decimal preimage logic correctly implements the mathematical constraint that floor always returns integers

🔍 Code Quality & Best Practices

1. Import Organization (datafusion/functions/src/math/floor.rs:22-27)

The imports are clean, but consider alphabetical ordering within groups:

use arrow::compute::{DecimalCast, rescale_decimal};
use arrow::datatypes::{
    ArrowNativeTypeOp, DataType, Decimal32Type, Decimal64Type, Decimal128Type,
    Decimal256Type, DecimalType, Float32Type, Float64Type,
};

2. Potential for Code Deduplication (datafusion/functions/src/math/floor.rs:262-301)

The four decimal type cases are nearly identical. While the current approach is clear and explicit, consider if a macro or generic helper could reduce repetition. However, the current explicit approach is acceptable for maintainability.


🐛 Potential Issues

1. Fractional Part Check Logic (datafusion/functions/src/math/floor.rs:378-379)

if scale > 0 && value % one_scaled != D::Native::ZERO {
    return None;
}

Concern: This check only applies when scale > 0, but what about when scale < 0? According to SQL standards, decimal types can have negative scales (e.g., DECIMAL(10, -2) means rounding to hundreds).

Questions:

  • Does Arrow support negative scales for decimals?
  • If so, should the fractional check be skipped or handled differently?
  • The test cases only use scale >= 0, so this edge case is untested

Recommendation: Either:

  • Add a comment explaining that negative scales are not supported/expected
  • Add validation or handling for negative scales
  • Add test coverage for scale = 0 and scale < 0 cases

2. Error Handling in rescale_decimal (datafusion/functions/src/math/floor.rs:368-373)

let one_scaled: D::Native = rescale_decimal::<D, D>(
    D::Native::ONE,
    1,
    0,
    precision,
    scale,
)?;

Observation: rescale_decimal returns Option, which is propagated with ?. This is correct, but it's worth documenting when this can fail:

  • Invalid precision/scale combinations
  • Overflow during scaling

Recommendation: Add a comment explaining failure conditions for future maintainers.


⚡ Performance Considerations

  1. Modulo Operation: The fractional part check uses modulo (value % one_scaled), which is generally efficient for integer types. No concerns here.

  2. rescale_decimal Call: This is called once per preimage computation, which is acceptable since this is an optimization path that runs during query planning, not query execution.


🔒 Security Concerns

No security issues identified. The implementation:

  • Uses checked arithmetic (add_checked) to prevent overflow
  • Properly handles null values
  • Returns None for invalid inputs rather than panicking

🧪 Test Coverage

Excellent coverage with the following test categories:

  • ✅ Valid cases for all decimal types
  • ✅ Non-integer values (should return None)
  • ✅ Overflow cases
  • ✅ Edge cases (min/max values)
  • ✅ Null handling
  • ✅ All comparison operators (=, <>, <, >, <=, >=)
  • ✅ IS DISTINCT FROM / IS NOT DISTINCT FROM

Potential additions:

  1. Test with scale = 0 explicitly (currently in edge cases but worth a dedicated test)
  2. Test with very high scales (e.g., scale = 38 for Decimal128)
  3. Integration test showing the optimization actually improves query performance

📝 Documentation & Comments

Good:

  • Function documentation is clear
  • Test comments explain the mathematical constraints

Could improve:

  • Add inline comment explaining why fractional values return None (datafusion/functions/src/math/floor.rs:378)
  • Document the rescale_decimal approach in the function comment

🎯 Specific Recommendations

  1. Address the scale < 0 edge case or add documentation

  2. Consider adding a comment in decimal_preimage_bounds explaining:

    /// Compute preimage bounds for floor function on decimal types.
    /// For floor(x) = n, the preimage is [n, n+1).
    /// 
    /// Returns None if:
    /// - The value has a fractional part (floor always returns integers)
    /// - Adding 1 would overflow
    /// - rescale_decimal fails (invalid precision/scale)
    
  3. Minor: Test naming consistency - some tests use decimal prefix, others don't. Consider consistent naming like:

    • test_floor_preimage_decimal32_*
    • test_floor_preimage_decimal64_*
      etc.

✅ Final Verdict

This is a high-quality PR that correctly implements decimal support for floor preimage optimization. The code is well-tested, follows existing patterns, and handles edge cases properly.

Recommended Action: Approve with minor suggestions for documentation improvements and clarification on negative scale handling.

Nice work! 🎉


Minor Nits

  • Line 262-301: Consider adding blank lines between different decimal type cases for readability
  • Test file header (datafusion/sqllogictest/test_files/floor_preimage.slt:18-25): Well-structured and informative

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for Decimal types to the floor function's preimage optimization, which is a great enhancement. The implementation looks correct, and the added unit tests and sqllogictest are comprehensive.

I have a few suggestions to improve code clarity and maintainability, mainly around reducing code duplication in the preimage function and improving the clarity of some test cases by using more idiomatic arithmetic operators. Overall, this is a solid contribution.

Comment on lines +262 to +302
// Decimal types
ScalarValue::Decimal32(Some(n), precision, scale) => {
decimal_preimage_bounds::<Decimal32Type>(*n, *precision, *scale).map(
|(lo, hi)| {
(
ScalarValue::Decimal32(Some(lo), *precision, *scale),
ScalarValue::Decimal32(Some(hi), *precision, *scale),
)
},
)
}
ScalarValue::Decimal64(Some(n), precision, scale) => {
decimal_preimage_bounds::<Decimal64Type>(*n, *precision, *scale).map(
|(lo, hi)| {
(
ScalarValue::Decimal64(Some(lo), *precision, *scale),
ScalarValue::Decimal64(Some(hi), *precision, *scale),
)
},
)
}
ScalarValue::Decimal128(Some(n), precision, scale) => {
decimal_preimage_bounds::<Decimal128Type>(*n, *precision, *scale).map(
|(lo, hi)| {
(
ScalarValue::Decimal128(Some(lo), *precision, *scale),
ScalarValue::Decimal128(Some(hi), *precision, *scale),
)
},
)
}
ScalarValue::Decimal256(Some(n), precision, scale) => {
decimal_preimage_bounds::<Decimal256Type>(*n, *precision, *scale).map(
|(lo, hi)| {
(
ScalarValue::Decimal256(Some(lo), *precision, *scale),
ScalarValue::Decimal256(Some(hi), *precision, *scale),
)
},
)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

These match arms for Decimal32, Decimal64, Decimal128, and Decimal256 are very repetitive. This code duplication could be reduced to improve maintainability.

Consider defining a local macro within the preimage function to handle the repetitive logic. For example:

macro_rules! handle_decimal {
    ($T:ty, $n:expr, $p:expr, $s:expr, $scalar_variant:path) => {
        decimal_preimage_bounds::<$T>(*$n, *$p, *$s).map(|(lo, hi)| {
            ($scalar_variant(Some(lo), *$p, *$s), $scalar_variant(Some(hi), *$p, *$s))
        })
    };
}

Then each match arm could be simplified to a single line, e.g.:
ScalarValue::Decimal32(Some(n), precision, scale) => handle_decimal!(Decimal32Type, n, precision, scale, ScalarValue::Decimal32)

Since adding the macro definition is outside the changed lines, I'm not providing a direct code suggestion, but this refactoring would make the code more concise.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! The match arms are identical and differ only in the type of the handled Decimal type. Prevents code duplication and double maintenance.


// Decimal256: i256::MAX
assert_preimage_none(ScalarValue::Decimal256(
Some(i256::MAX.wrapping_sub(i256::from(50))),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using wrapping_sub here can be misleading as the subtraction is not expected to wrap. Using the standard subtraction operator - would be clearer and more accurately reflect the test's intent, as i256 implements Sub.

Suggested change
Some(i256::MAX.wrapping_sub(i256::from(50))),
Some(i256::MAX - i256::from(50)),

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:annoying; category:bug; feedback: The Gemini AI reviewer is not correct! There is no implementation of std::ops::Sub for i256 in Apache Arrow

assert_preimage_range(
ScalarValue::Decimal256(Some(large_256), 76, 2),
ScalarValue::Decimal256(Some(large_256), 76, 2),
ScalarValue::Decimal256(Some(large_256.wrapping_add(i256::from(100))), 76, 2),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For clarity and consistency with decimal_preimage_bounds which uses add_checked, it's better to use the + operator here instead of wrapping_add. Since this test case is for non-overflowing values, + better reflects the intent.

Suggested change
ScalarValue::Decimal256(Some(large_256.wrapping_add(i256::from(100))), 76, 2),
ScalarValue::Decimal256(Some(large_256 + i256::from(100)), 76, 2),

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:annoying; category:bug; feedback: The Gemini AI reviewer is not correct! There is no implementation of std::ops::Add for i256 in Apache Arrow

assert_preimage_range(
ScalarValue::Decimal256(Some(neg_256), 76, 2),
ScalarValue::Decimal256(Some(neg_256), 76, 2),
ScalarValue::Decimal256(Some(neg_256.wrapping_add(i256::from(100))), 76, 2),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the previous case, using + instead of wrapping_add would be clearer here, as this is a non-overflowing test case.

Suggested change
ScalarValue::Decimal256(Some(neg_256.wrapping_add(i256::from(100))), 76, 2),
ScalarValue::Decimal256(Some(neg_256 + i256::from(100)), 76, 2),

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:annoying; category:bug; feedback: The Gemini AI reviewer is not correct! There is no implementation of std::ops::Add for i256 in Apache Arrow

@augmentcode
Copy link

augmentcode bot commented Feb 2, 2026

🤖 Augment PR Summary

Summary: Extends the floor scalar UDF preimage logic to support Arrow Decimal types so predicate simplification can be applied to decimal comparisons.

Changes:

  • Adds Decimal32/64/128/256 handling in FloorFunc::preimage to derive [N, N+1) bounds.
  • Introduces a shared decimal_preimage_bounds helper that computes “+1” at the correct scale via rescale_decimal and uses checked arithmetic for overflow safety.
  • Adds comprehensive unit tests covering valid decimal ranges, non-integer decimal literals, null literals, and boundary conditions.
  • Adds a new SQLLogicTest file validating both correctness and EXPLAIN-plan rewrites for Float64/Int32/Decimal predicates.

Technical Notes: Decimal preimage optimization is only applied when the compared literal is an integer value at its declared scale and N + 1 can be represented without overflow.

🤖 Was this summary useful? React with 👍 or 👎

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 1 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.


// Decimal32: i32::MAX
// For scale=2, we add 100, so i32::MAX - 50 would overflow
assert_preimage_none(ScalarValue::Decimal32(Some(i32::MAX - 50), 9, 2));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These scale=2 “overflow” cases (e.g. i32::MAX - 50) aren’t aligned to the scale factor (100), so preimage returns None due to the “non-integer decimal” check rather than actually exercising the add_checked overflow path.

Other Locations
  • datafusion/functions/src/math/floor.rs:684
  • datafusion/functions/src/math/floor.rs:688
  • datafusion/functions/src/math/floor.rs:692

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback: The Augment AI reviewer is correct! i32::MAX - 50 gives 2147483597, which has 10 digits and does not fit into precision=9 at all. So, the decimal number is overflown by construction and there is no call to floor() at all

@martin-augment
Copy link
Owner Author

1. Fractional Part Check Logic (datafusion/functions/src/math/floor.rs:378-379)

if scale > 0 && value % one_scaled != D::Native::ZERO {
    return None;
}

Concern: This check only applies when scale > 0, but what about when scale < 0? According to SQL standards, decimal types can have negative scales (e.g., DECIMAL(10, -2) means rounding to hundreds).

Questions:

  • Does Arrow support negative scales for decimals?
  • If so, should the fractional check be skipped or handled differently?
  • The test cases only use scale >= 0, so this edge case is untested

Recommendation: Either:

  • Add a comment explaining that negative scales are not supported/expected
  • Add validation or handling for negative scales
  • Add test coverage for scale = 0 and scale < 0 cases

value:useful; category:bug; feedback: The Claude AI reviewer is correct! Apache Arrow supports negative scales for decimals, so this check won't detect such decimals with fractional part. Prevents a bug with decimals with negative scales. Also shows that there are no unit/slt tests for such.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant